34 research outputs found

    SICLE: A high-throughput tool for extracting evolutionary relationships from phylogenetic trees

    We present the phylogeny analysis software SICLE (Sister Clade Extractor), an easy-to-use, high-throughput tool that describes the nearest neighbors of a node of interest in a phylogenetic tree, together with the support value for that relationship. The application is a command-line utility that can be embedded into a phylogenetic analysis pipeline or used as a subroutine within another C++ program. As a test case, we applied this new tool to the published phylome of Salinibacter ruber, a species of halophilic Bacteroidetes, identifying 13 unique sister relationships to S. ruber across the 4589 gene phylogenies. S. ruber grouped with bacteria, most often other Bacteroidetes, in the majority of phylogenies, but 91 phylogenies showed a branch-supported sister association between S. ruber and Archaea, an evolutionarily intriguing relationship indicative of horizontal gene transfer. This test case demonstrates how SICLE makes it possible to summarize the phylogenetic information produced by automated phylogenetic pipelines and to rapidly identify and quantify the possible evolutionary relationships that merit further investigation. SICLE is available for free for noncommercial use at http://eebweb.arizona.edu/sicle/.
    Comment: 8 pages, 4 figures in journal submission format
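
    The core operation, finding the sister clade of a query leaf and the support value of the branch that unites them, can be illustrated with a short Python sketch. This is not SICLE's C++ implementation or its interface: the `Node` class, the `sister_clade` function, the convention that a clade's support value is stored on the clade's root node, and the leaf names (e.g. `Haloarchaeon_X`) are all hypothetical, and reading trees from Newick files is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)  # compare nodes by identity, not by (recursive) field values
class Node:
    """Rooted tree node. Assumption: `support` holds the branch support
    (e.g. bootstrap %) of the clade rooted at this node."""
    name: str = ""
    support: Optional[float] = None
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach `child` below this node and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def leaves(self) -> List[str]:
        """Names of all leaves in the clade rooted here."""
        if not self.children:
            return [self.name]
        return [leaf for c in self.children for leaf in c.leaves()]


def find(node: Node, name: str) -> Optional[Node]:
    """Depth-first search for the node with the given name."""
    if node.name == name:
        return node
    for child in node.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


def sister_clade(root: Node, query: str) -> Tuple[List[str], Optional[float]]:
    """Return the sorted leaves of the query's sister clade(s) and the support
    of the parent node that unites them with the query (hypothetical API)."""
    target = find(root, query)
    if target is None or target.parent is None:
        raise ValueError(f"{query!r} is not a non-root node of this tree")
    parent = target.parent
    sisters = [leaf for sib in parent.children if sib is not target
               for leaf in sib.leaves()]
    return sorted(sisters), parent.support


# Hypothetical gene tree: ((S_ruber,Haloarchaeon_X)95,(Bacteroidetes_B,Bacteroidetes_C)80);
root = Node()
clade1 = root.add(Node(support=95.0))
clade1.add(Node("S_ruber"))
clade1.add(Node("Haloarchaeon_X"))
clade2 = root.add(Node(support=80.0))
clade2.add(Node("Bacteroidetes_B"))
clade2.add(Node("Bacteroidetes_C"))

print(sister_clade(root, "S_ruber"))  # (['Haloarchaeon_X'], 95.0)
```

    Summarizing a whole phylome would then amount to calling such a function on every gene tree and tallying the resulting sister sets (for instance with `collections.Counter`), optionally keeping only relationships whose support exceeds a chosen threshold.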

    Highlights from the eleventh ISCB Student Council Symposium 2015

    This report summarizes the scientific content and activities of the annual symposium organized by the Student Council of the International Society for Computational Biology (ISCB), held in conjunction with the Intelligent Systems for Molecular Biology (ISMB) / European Conference on Computational Biology (ECCB) conference in Dublin, Ireland, on July 10, 2015.

    Epigenetic Regulation of MicroRNA Genes and the Role of miR-34b in Cell Invasion and Motility in Human Melanoma

    Invasive melanoma is the most lethal form of skin cancer. Treating melanoma-derived cell lines with 5-aza-2′-deoxycytidine (5-Aza-dC) markedly increases the expression of several miRNAs, suggesting that the miRNA-encoding genes might be epigenetically regulated, directly or indirectly, by DNA methylation. We have identified a group of epigenetically regulated miRNA genes in melanoma cells and have confirmed that the upstream CpG island sequences of several such genes are hypermethylated in cell lines derived from different stages of melanoma, but not in melanocytes and keratinocytes. We used direct bisulfite analysis of DNA and immunoprecipitation of methylated DNA (Methyl-DIP) to identify changes in CpG island methylation in distinct melanoma patient samples classified as primary in situ, regional metastatic, and distant metastatic. Two melanoma cell lines, WM1552C and A375 (derived from stage 3 and stage 4 human melanoma, respectively), were engineered to ectopically express one of the epigenetically modified miRNAs, miR-34b. Expression of miR-34b reduced the invasion and motility rates of both WM1552C and A375 cells, suggesting that the enhanced invasiveness and motility observed in metastatic melanoma cells may be related to their reduced expression of miR-34b. Total RNA isolated from control or miR-34b-expressing WM1552C cells was subjected to deep sequencing to identify gene networks around miR-34b. We identified network modules that are potentially regulated by miR-34b and that suggest a mechanism for the role of miR-34b in regulating normal cell motility and cytokinesis.

    Core column prediction for protein multiple sequence alignments

    Background: In a computed protein multiple sequence alignment, the coreness of a column is the fraction of its substitutions that are in so-called core columns of the gold-standard reference alignment of its proteins. In benchmark suites of protein reference alignments, the core columns of the reference alignment are those that can be confidently labeled as correct, usually due to all residues in the column being sufficiently close in the spatial superposition of the known three-dimensional structures of the proteins. Typically the accuracy of a protein multiple sequence alignment that has been computed for a benchmark is only measured with respect to the core columns of the reference alignment. When computing an alignment in practice, however, a reference alignment is not known, so the coreness of its columns can only be predicted. Results: We develop for the first time a predictor of column coreness for protein multiple sequence alignments. This allows us to predict which columns of a computed alignment are core, and hence better estimate the alignment's accuracy. Our approach to predicting coreness is similar to nearest-neighbor classification from machine learning, except we transform nearest-neighbor distances into a coreness prediction via a regression function, and we learn an appropriate distance function through a new optimization formulation that solves a large-scale linear programming problem. We apply our coreness predictor to parameter advising, the task of choosing parameter values for an aligner's scoring function to obtain a more accurate alignment of a specific set of sequences. 
We show that for this task, our predictor strongly outperforms other column-confidence estimators from the literature and affords a substantial boost in alignment accuracy.
    Funding: University of Arizona by US National Science Foundation [IIS-1217886]; Carnegie Mellon University by NSF [CCF-1256087]; NSF [CCF-131999]; NIH [R01HG007104]; Gordon and Betty Moore Foundation [GBMF4554]; University of Arizona Open Access Publishing Fund. Open Access Journal.
    This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries.
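
    The coreness definition can be made concrete with a small Python sketch. It assumes that a substitution is an aligned pair of residues from two different sequences, and that such a pair lies in a core column when both residues sit in the same reference column and that column is labeled core. The function name and data layout are hypothetical; the sketch computes the quantity being predicted from a known reference, not the paper's predictor, which has to estimate coreness without one.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

# A residue is identified by (sequence index, position within that sequence).
Residue = Tuple[int, int]


def column_coreness(
    column: List[Optional[int]],
    ref_column_of: Dict[Residue, int],
    core_columns: Set[int],
) -> Optional[float]:
    """Fraction of a computed column's substitutions that lie in core reference columns.

    column[s] is the position of sequence s's residue in this computed column,
    or None for a gap. ref_column_of maps each residue to the index of the
    reference-alignment column containing it. Returns None when the column
    has fewer than two residues (and hence no substitutions).
    """
    residues = [(s, pos) for s, pos in enumerate(column) if pos is not None]
    pairs = list(combinations(residues, 2))  # every aligned residue pair
    if not pairs:
        return None
    in_core = sum(
        1
        for a, b in pairs
        if ref_column_of[a] == ref_column_of[b] and ref_column_of[a] in core_columns
    )
    return in_core / len(pairs)


# Hypothetical example: three sequences, one computed column.
computed_column = [4, 2, 7]  # residues (0,4), (1,2), (2,7)
reference = {(0, 4): 10, (1, 2): 10, (2, 7): 11}
print(column_coreness(computed_column, reference, core_columns={10, 11}))  # 0.3333333333333333
```

    Only the pair (0,4)–(1,2) shares a core reference column here, so one of the three substitutions counts. Since the reference is unknown when aligning in practice, a value like this would serve as the target that a coreness predictor tries to estimate.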

    Learning Parameter Sets for Alignment Advising

    While the multiple sequence alignment output by an aligner strongly depends on the parameter values used for the alignment scoring function (such as the choice of gap penalties and substitution scores), most users rely on the single default parameter setting provided by the aligner. A different parameter setting, however, might yield a much higher-quality alignment for the specific set of input sequences. The problem of picking a good choice of parameter values for specific input sequences is called parameter advising. A parameter advisor has two ingredients: (i) a set of parameter choices to select from, and (ii) an estimator that provides an estimate of the accuracy of the alignment computed by the aligner using a parameter choice. The parameter advisor picks the parameter choice from the set whose resulting alignment has the highest estimated accuracy. We consider for the first time the problem of learning the optimal set of parameter choices for a parameter advisor that uses a given accuracy estimator. The optimal set is one that maximizes the expected true accuracy of the resulting parameter advisor, averaged over a collection of training data. While we prove that learning an optimal set for an advisor is NP-complete, we show that there is a natural approximation algorithm for this problem and prove a tight bound on its approximation ratio. Experiments with an implementation of this approximation algorithm on biological benchmarks, using various accuracy estimators from the literature, show that it finds sets for advisors that are surprisingly close to optimal. Furthermore, the resulting parameter advisors are significantly more accurate in practice than simply aligning with a single default parameter choice.
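
    The advisor and the set-learning objective can be sketched in Python. The advisor picks, for each input, the parameter choice with the highest estimated accuracy, and a candidate set is scored by the advisor's average true accuracy over training examples. The set itself is grown by a simple greedy heuristic that repeatedly adds the choice yielding the highest average accuracy; this is a swapped-in illustration, not necessarily the paper's approximation algorithm, and no approximation guarantee is claimed for it here. The parameter names and accuracy values are made up.

```python
from typing import Dict, List, Sequence, Tuple

# One training example: parameter choice -> (estimated accuracy, true accuracy).
Example = Dict[str, Tuple[float, float]]


def advise(params: Sequence[str], example: Example) -> str:
    """Pick the parameter choice whose alignment has the highest estimated accuracy."""
    return max(params, key=lambda p: example[p][0])


def advisor_accuracy(params: Sequence[str], training: List[Example]) -> float:
    """Average true accuracy of the advisor restricted to `params`."""
    return sum(ex[advise(params, ex)][1] for ex in training) / len(training)


def greedy_parameter_set(universe: Sequence[str], training: List[Example], k: int) -> List[str]:
    """Greedily grow a set of at most k choices; each step adds the choice
    that gives the advisor the highest average true accuracy (a heuristic)."""
    chosen: List[str] = []
    while len(chosen) < k:
        remaining = [p for p in universe if p not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda p: advisor_accuracy(chosen + [p], training))
        chosen.append(best)
    return chosen


# Hypothetical data: three parameter choices evaluated on two training inputs.
UNIVERSE = ["default", "A", "B"]
TRAINING: List[Example] = [
    {"default": (0.60, 0.60), "A": (0.80, 0.85), "B": (0.50, 0.55)},
    {"default": (0.70, 0.70), "A": (0.60, 0.50), "B": (0.90, 0.88)},
]

print(greedy_parameter_set(UNIVERSE, TRAINING, k=2))  # ['B', 'A']
```

    On this toy data no single choice is best on both inputs, yet an advisor choosing between `A` and `B` via the estimator averages 0.865 true accuracy, compared with 0.65 for always using `default`.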